The information bottleneck method is a technique in information theory introduced by Naftali Tishby et al. for finding the best tradeoff between accuracy and complexity (compression) when summarizing (e.g. clustering) a random variable X, given a joint probability distribution between X and an observed relevant variable Y. Other applications include distributional clustering and dimension reduction. In a well-defined sense it generalizes the classical notion of minimal sufficient statistics from parametric statistics to arbitrary distributions, not necessarily of exponential form. It does so by relaxing the sufficiency condition to capture some fraction of the mutual information with the relevant variable Y.

The compressed variable is <math>T</math>, and the algorithm minimizes the following quantity over the conditional distribution <math>p(t\mid x)</math>:

: <math>\min_{p(t\mid x)} \; I(X;T) - \beta\, I(T;Y),</math>

where <math>I(X;T)</math> and <math>I(T;Y)</math> are the mutual information between X and T and between T and Y, respectively, and <math>\beta</math> is a Lagrange multiplier.

== Gaussian information bottleneck ==

A relatively simple application of the information bottleneck is to Gaussian variates, and this has some semblance to a least squares reduced rank or canonical correlation. Assume X, Y are jointly multivariate zero mean normal vectors with covariances <math>\Sigma_{XX}, \Sigma_{XY}</math>, and T is a compressed version of X which must maintain a given value of mutual information with Y. It can be shown that the optimum T is a normal vector consisting of linear combinations of the elements of X, <math>T = AX</math>, where the matrix <math>A</math> has orthogonal rows.

The projection matrix <math>A</math> in fact contains <math>M</math> rows selected from the weighted left eigenvectors of the singular value decomposition of the following matrix (generally asymmetric)

: <math>\Omega = \Sigma_{X\mid Y}\, \Sigma_{XX}^{-1} = I - \Sigma_{XY}\, \Sigma_{YY}^{-1}\, \Sigma_{YX}\, \Sigma_{XX}^{-1}.</math>

Define the singular value decomposition

: <math>\Omega = U \Lambda V^{T} \quad \text{with} \quad \Lambda = \operatorname{Diag}\left(\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N\right)</math>

and the critical values

: <math>\beta_i^{C} = \left(1 - \lambda_i\right)^{-1}, \qquad \lambda_i < 1.</math>

Then the number <math>M</math> of active eigenvectors in the projection, or order of approximation, is given by

: <math>\beta_M^{C} < \beta \le \beta_{M+1}^{C},</math>

and we finally get

: <math>A = \left[\, w_1 U_1, \ldots, w_M U_M \,\right]^{T},</math>

in which the weights are given by

: <math>w_i = \sqrt{\frac{\beta\left(1 - \lambda_i\right) - 1}{\lambda_i\, r_i}},</math>

where <math>r_i = U_i^{T} \Sigma_{XX} U_i</math>.

Applying the Gaussian information bottleneck to time series yields optimal predictive coding. This procedure is formally equivalent to linear Slow Feature Analysis. Optimal temporal structures in linear dynamic systems can be revealed in the so-called past-future information bottleneck, an application of the bottleneck method to non-Gaussian sampled data. The concept, as treated by Creutzig, Tishby et al., is not without complication, as there are two independent phases in the exercise: first, estimation of the unknown parent probability densities from which the data samples are drawn, and second, the use of these densities within the information-theoretic framework of the bottleneck.
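For a discrete joint distribution, the minimization above is commonly carried out with self-consistent iterative updates of the encoder <math>p(t\mid x)</math>, the cluster marginal <math>p(t)</math> and the relevance distribution <math>p(y\mid t)</math> (a Blahut-Arimoto-style scheme). The Python sketch below illustrates those updates; the function name, initialization and numerical safeguards are illustrative choices, not taken from the article.

<syntaxhighlight lang="python">
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=300, seed=0):
    """Self-consistent IB updates for a discrete joint distribution p(x, y).

    Minimizes I(X;T) - beta * I(T;Y) over the encoder p(t|x).
    Assumes every row of p_xy carries positive probability mass.
    """
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                         # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]              # p(y | x)

    # random soft assignment p(t | x); each row sums to 1
    p_t_given_x = rng.random((p_xy.shape[0], n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = p_t_given_x.T @ p_x                                  # p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= p_t[:, None] + 1e-12                        # p(y | t)

        # KL(p(y|x) || p(y|t)) for every pair (x, t)
        log_ratio = np.log(p_y_given_x[:, None, :] + 1e-12) \
                  - np.log(p_y_given_t[None, :, :] + 1e-12)
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)

        # encoder update: p(t|x) proportional to p(t) * exp(-beta * KL);
        # the per-row shift of kl is purely for numerical stability
        kl -= kl.min(axis=1, keepdims=True)
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    return p_t_given_x
</syntaxhighlight>

Sweeping <math>\beta</math> from small to large values traces out the tradeoff between compression <math>I(X;T)</math> and preserved relevant information <math>I(T;Y)</math>.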
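The Gaussian construction can likewise be sketched numerically: form <math>\Omega</math>, take its left eigenvectors, keep those whose critical value has been passed by the chosen <math>\beta</math>, and scale them with the weights <math>w_i</math>. The sketch below assumes the covariance blocks are supplied as NumPy arrays; the function name and the handling of degenerate eigenvalues are illustrative choices, not part of the original treatment.

<syntaxhighlight lang="python">
import numpy as np

def gaussian_ib_projection(sigma_xx, sigma_xy, sigma_yy, beta):
    """Weighted-eigenvector projection A of the Gaussian information bottleneck.

    Returns A such that T = A X; rows are the active left eigenvectors of
    Omega = Sigma_{X|Y} Sigma_{XX}^{-1}, scaled by the weights w_i.
    """
    # conditional covariance Sigma_{X|Y} and the (generally asymmetric) Omega
    sigma_x_given_y = sigma_xx - sigma_xy @ np.linalg.solve(sigma_yy, sigma_xy.T)
    omega = sigma_x_given_y @ np.linalg.inv(sigma_xx)

    # left eigenvectors of Omega = eigenvectors of Omega^T, sorted by ascending
    # eigenvalue; the eigenvalues are real in [0, 1], so imaginary parts are noise
    eigvals, eigvecs = np.linalg.eig(omega.T)
    order = np.argsort(eigvals.real)
    eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

    rows = []
    for lam, u in zip(eigvals, eigvecs.T):
        if not (0.0 < lam < 1.0):
            continue                              # weight formula needs 0 < lambda_i < 1
        beta_crit = 1.0 / (1.0 - lam)             # beta_i^C = (1 - lambda_i)^{-1}
        if beta <= beta_crit:
            continue                              # eigenvector not yet active
        r = u @ sigma_xx @ u                      # r_i = U_i^T Sigma_XX U_i
        w = np.sqrt((beta * (1.0 - lam) - 1.0) / (lam * r))
        rows.append(w * u)

    # one weighted eigenvector per active direction (may be empty for small beta)
    return np.array(rows)
</syntaxhighlight>

For <math>\beta</math> below the smallest critical value the returned projection is empty, corresponding to the degenerate solution in which T retains no information about X.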